VMworld 2007 is coming up, and I'm more excited than usual this time around. Alex Mirgorodskiy, Robert Benson and I will be demonstrating a piece of technology called VProbes.
VProbes was born from an experience that many of you have probably shared: I was wondering why my computer was so slow. I have whiled away many hours over the years investigating this question, and I have rarely gotten a solid answer. Sometimes the slowness is intermittent, and by the time I've developed a theory, it has already disappeared. Other times, the diagnostic tool I want to run (top, or strace, or ltrace, or what have you) isn't installed in the VM, because it's a purpose-built VM with a constrained set of user-land tools. Or, the tools are there, but the system is not in a state to allow me to use them; e.g., it's hung early in boot, and I can't log in to the system. Still other times, I never really scratch the surface of the problem because it's Windows and I don't know my way around.
VProbes attempts to provide a set of tools for answering the question, "What the heck is this computer doing?" It's an open-ended question, so VProbes is accordingly open-ended. In its current form, it provides an interactive, safe way of instrumenting a running VM at any level: from user-level processes down to the kernel, and even into VMware's VMM and hypervisor, if need be.
We also attempt to solve the "unfamiliar tools" problem by papering over (to the degree possible) the differences among operating systems. There is a broad consensus among Windows and the UNIX flavors about the abstractions that operating systems provide: all major operating systems provide threads, a tree of named processes, named files in a hierarchical namespace with byte contents, sockets, etc. VProbes tries to get by on just enough guest-specific knowledge to "wrap" those differences. Thus, for Windows, Linux, and any other operating system that VProbes can be "taught" to understand, the following VProbes script provides a rough-and-ready approximation of the top UNIX utility:
    Guest_TimerIRQ ticks[curprocname()] <- 1;

"Big deal," you might be thinking. "I had top already." Well, you didn't have top on Windows. And you didn't have it while the machine was booting, or shutting down. And you certainly didn't have it if the machine in question was a virtual appliance that doesn't even allow logins.
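For readers who want to see the idea behind that one-liner spelled out, here is a minimal Python sketch of the same sampling technique. It is not VProbes code and does not reflect how VProbes is implemented: it assumes that the script fires once per guest timer interrupt and that `<- 1` adds one to a counter keyed by the current process name. The function names `sample_ticks` and `top_report`, and the hand-written trace of process names, are hypothetical stand-ins for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def sample_ticks(running_at_tick: Iterable[str]) -> Counter:
    """Count timer ticks per process name.

    Each element of running_at_tick stands in for what curprocname()
    would report at one timer interrupt (an assumption for this sketch).
    """
    ticks: Counter = Counter()
    for name in running_at_tick:
        ticks[name] += 1  # the analogue of: ticks[curprocname()] <- 1
    return ticks


def top_report(ticks: Counter, limit: int = 5) -> List[Tuple[str, int, float]]:
    """Return (name, tick count, percent of all ticks), busiest first."""
    total = sum(ticks.values())
    return [
        (name, count, 100.0 * count / total)
        for name, count in ticks.most_common(limit)
    ]


if __name__ == "__main__":
    # A made-up trace: which process was running at each of ten ticks.
    trace = ["idle"] * 6 + ["firefox"] * 3 + ["sshd"]
    for name, count, pct in top_report(sample_ticks(trace)):
        print(f"{name:<10} {count:>4} {pct:5.1f}%")
```

Because the timer fires at a regular interval, the share of ticks attributed to each process approximates its share of CPU time, which is all a top-style display really needs.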
Anyway, if this sounds interesting to you, please consider attending our VMworld breakout session, TA70, on the 13th from 3:30 to 4:30. I should warn that this is basically a research prototype: we haven't committed to putting VProbes in any upcoming VMware product, and indeed, might never ship it at all. We're putting what we've got in front of the public now because we are at the point where we need feedback from real users to improve VProbes.
In the "credit where it's due" department: we owe an enormous debt in our thinking about this problem to our colleagues at Sun. I've never hidden my admiration for DTrace. We hope to improve on it. First, we are aiming to provide a DTrace-like tool for commercially important operating systems other than Solaris. Second, VProbes can combine with other virtualization-based techniques in powerful ways. For example, VProbes and deterministic replay combine to make the most potent tool that I'm aware of for debugging intermittent performance anomalies. I'll leave applications to VMotion, snapshots, etc., as tantalizing hints...